Cognateness, frequency, and vocabulary size

An interactive account of bilingual lexical acquisition

Gonzalo Garcia-Castro
Daniela S. Ávila-Varela
Ignacio Castillejo
Núria Sebastian-Galles

Bilingual word acquisition


Word acquisition starts around 6 months of age (Jusczyk and Aslin 1995; Tincoff and Jusczyk 1999; Bergelson and Swingley 2012)

Word-learning involves the (challenging) task of associating a word-form to its referential context (ambiguous, variable)

Bilinguals face the challenge of learning more than one word-form per referent

Bilinguals keep up with their monolingual peers: how?

Learning outputs: measuring vocabulary size

Vocabulary checklist: number/proportion of words checked by caregivers as Understands, and/or Says

Understands Understands & Says
chair [ ] [ ]
table [ ] [ ]
[ ] [ ]


English-Spanish bilinguals have smaller English vocabulary sizes than monolinguals, but similar vocabulary sizes when both languages are summed together (Hoff et al. 2012)

Linguistic distance


Bilingual toddlers learning two typologically close languages show larger vocabulary sizes than those learning more distant languages (Floccia et al. 2018)

Cognates: form-similar translation equivalents (TEs)

Cognate Non-cognate
[cat] /ˈgat-ˈgato/ [dog] /ˈgos-ˈpe.ro/

Bilinguals acquire TEs from the earliest stages of vocabulary growth (Bilson et al. 2015; Tsui et al. 2022)

Cognateness facilitates vocabulary growth? Mechanisms?

Parallel activation: candidate mechanism?

Lexical access is language non-selective:

Translation equivalents are co-activated, even in monolingual situations

Cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2022; Bosch and Ramon-Casas 2014)

Dissociation between models of bilingual word processing (parallel activation) and word acquisition


Simulating word acquisition

Accumulator models


Word acquisition as a continuous process of lexical consolidation (Hidaka 2013; Mollica and Piantadosi 2017)



For participant \(i\) and word \(j\):

\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \end{aligned} \]

\[ \begin{aligned} \text{Age of acquisition}_{ij} &= \underset{\text{Age}_i}{\arg\min}\; |\text{Threshold}_{ij}-\text{Learning instances}_{ij}| \end{aligned} \]

We fix some parameters:

\[ \begin{aligned} \text{Threshold} &= 250 \\ \lambda &= 50 \end{aligned} \]
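The accumulator above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: it assumes age is measured in whole months, that frequency counts learning instances per month, and that the search stops at a hypothetical `max_age` of 48 months. The helper names (`sample_poisson`, `age_of_acquisition`, `simulate`) are made up for this example.

```python
import math
import random

THRESHOLD = 250  # learning instances needed to acquire a word (value from the slide)
LAMBDA = 50      # Poisson rate of word frequency (value from the slide)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def age_of_acquisition(frequency, threshold=THRESHOLD, max_age=48):
    """Age (assumed: months) whose learning instances are closest to the threshold."""
    if frequency <= 0:
        return None  # a word that is never heard is never acquired
    # Learning instances = Age * Frequency, as in the equation above.
    return min(range(max_age + 1), key=lambda age: abs(threshold - age * frequency))


def simulate(n_words, seed=1):
    """Sample word frequencies and return (frequencies, ages of acquisition)."""
    rng = random.Random(seed)
    frequencies = [sample_poisson(LAMBDA, rng) for _ in range(n_words)]
    ages = [age_of_acquisition(f) for f in frequencies]
    return frequencies, ages


frequencies, ages = simulate(200)
print(sum(frequencies) / len(frequencies))  # sample mean of the frequencies
```

With the fixed parameters, a word with frequency exactly 50 crosses the threshold of 250 at age 5, since 5 × 50 = 250.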

Simulating word acquisition

Catalan monolingual (no parallel activation)

Catalan Spanish
100% 0%

Including learning instances from parallel activation

\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j + \\ &(\text{Similarity}_j \cdot \text{Learning instances}_{ij'}) \end{aligned} \]

Catalan-Spanish bilingual (no parallel activation)

Catalan Spanish
60% 40%

Catalan-Spanish bilingual (parallel activation)

Catalan Spanish
75% 25%
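A rough sketch of the parallel-activation term, under several assumptions that the slides do not state: \(j'\) is read as the translation equivalent of word \(j\); each word's own instances are weighted by the proportion of exposure to its language (for example 0.6 for Catalan and 0.4 for Spanish); and the instances borrowed from the translation equivalent are its own direct instances only, which avoids a circular definition. All function and parameter names are hypothetical.

```python
THRESHOLD = 250  # learning instances needed for acquisition (value from the slides)


def direct_instances(age, frequency, exposure):
    """Instances from hearing the word itself (exposure weighting is an assumption)."""
    return age * frequency * exposure


def total_instances(age, frequency, exposure, te_frequency, te_exposure, similarity):
    """Own instances plus similarity-weighted instances of the translation equivalent."""
    own = direct_instances(age, frequency, exposure)
    from_te = direct_instances(age, te_frequency, te_exposure)
    return own + similarity * from_te


def age_of_acquisition(frequency, exposure, te_frequency, te_exposure,
                       similarity, threshold=THRESHOLD, max_age=60):
    """Age (assumed: months) whose total instances are closest to the threshold."""
    args = (frequency, exposure, te_frequency, te_exposure, similarity)
    if total_instances(1, *args) <= 0:
        return None  # no learning instances accumulate at all
    return min(range(max_age + 1),
               key=lambda age: abs(threshold - total_instances(age, *args)))


# Hypothetical Spanish word heard 40% of the time, with a Catalan translation
# heard 60% of the time; both have frequency 50.
non_cognate = age_of_acquisition(50, 0.4, 50, 0.6, similarity=0.0)
cognate = age_of_acquisition(50, 0.4, 50, 0.6, similarity=0.5)
print(non_cognate, cognate)
```

Under these assumptions, raising the similarity moves the age of acquisition of the lower-exposure word earlier, because the translation equivalent in the dominant language contributes more borrowed instances.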

Methods

Questionnaire


  • Online (formr, Arslan, Walther, and Tata 2020), inspired by the MacArthur-Bates CDI (Fenson et al. 1994)

  • ~1,600 items/words (800 Catalan + 800 Spanish)

  • Participants completed one of four versions of the questionnaire:

  • 500 items: 250 Catalan + 250 Spanish

  • Short-listed (nouns): 302 translation equivalents (TE)

Participants

138,078 item responses from 366 participants

1 time 2 times 3 times 4 times
312 42 8 4

Modelling

Model structure


Ordinal regression model: \(P(Understands)\), \(P(Says)\)

  • No < Understands < Understands and Says

Multilevel: Crossed-random effects

  • Participant and Translation equivalent as grouping variables

Bayesian: probability of parameter values

\[P(\text{model} | \text{data}) \propto P(\text{data} | \text{model}) \times P(\text{model})\]
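To make the ordinal structure (No < Understands < Understands and Says) concrete, here is a small sketch of how a cumulative-logit model turns a linear predictor into three category probabilities. The parameterisation, \(P(Y \le k) = \text{logistic}(\tau_k - \eta)\), is a common textbook form and is assumed here; the slides do not say which link or threshold convention the fitted model uses. The threshold values and the function name are hypothetical.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds):
    """Probabilities of each ordered category under a cumulative-logit model.

    eta: linear predictor (e.g. built from age, exposure, cognateness).
    thresholds: increasing cut points, one fewer than the number of categories.
    """
    cumulative = [logistic(tau - eta) for tau in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical cut points for: No | Understands | Understands and Says
p_no, p_understands, p_says = category_probabilities(eta=0.0, thresholds=[-1.0, 1.0])
print(p_no, p_understands, p_says)
```

In this form, a larger linear predictor shifts probability mass away from "No" and towards "Understands and Says", which is how positive slopes would be read.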

Predictors

Predictor Example
Age Months
Length Number of phonemes
Exposure Lexical frequency \(\times\) Language exposure
Cognateness Levenshtein similarity between a word-form and its translation
Two-way and three-way interactions between age, exposure, and cognateness

Results

Posterior distribution

Predictor Estimate 95% HDI p(H0)
Intercepts
Comprehension and Production 0.438 [-0.5, 0.5] 0.088
Comprehension 0.936 [2.44, 0.95] 0.000
Slopes
Age (+1 SD, 4.87, months) 0.405 [1.43, 0.45] 0.000
Exposure (+1 SD, 1.81) 0.233 [0.8, 0.27] 0.000
Cognateness (+1 SD, 25.65%) 0.058 [0.06, 0.1] 0.037
Length (+1 SD, 1.56 phonemes) -0.062 [-0.35, -0.04] 0.000
Age × Exposure 0.071 [0.16, 0.1] 0.000
Age × Cognateness 0.014 [0, 0.03] 0.985
Exposure × Cognateness -0.057 [-0.28, -0.05] 0.000
Age × Exposure × Cognateness -0.018 [-0.11, -0.01] 0.975

Posterior predictions

Discussion

Cognateness facilitates word acquisition

Only low-exposure words benefit from their cognate status: the less dominant language receives more facilitation

Parallel activation as a mechanism that boosts lexical consolidation: increment in cumulative learning instances

Catalan-Spanish: very specific population

Next steps: word-learning, formalisation

Appendix

Item properties

Levenshtein similarity

Phonological similarity

Levenshtein distance: minimum number of single-character edits (insertions, deletions, substitutions) needed to make two character strings identical

Orthography Phonology String
Catalan porta /ˈpɔɾ.tə/ pɔɾtə
Spanish puerta /ˈpweɾ.ta/ pweɾta

Levenshtein similarity

\[ 1-\frac{\text{lev}(A, B)}{\max(\text{length}(A), \text{length}(B))} \]

Catalan Spanish Levenshtein
porta (/ˈpɔɾ.tə/) puerta (/ˈpweɾ.ta/) 0.50 (3)
taula (/ˈtaw.lə/) mesa (/ˈmesa/) 0.00 (5)
cotxe (/ˈkɔ.t͡ʃə/) coche (/ˈkot͡ʃe/) 0.40 (3)
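The similarity score can be sketched in plain Python as follows. This is an illustrative implementation, not the authors' code: it uses the standard dynamic-programming algorithm for Levenshtein distance and treats each Unicode code point as one character. The input strings for porta/puerta are the ones given above; those for taula/mesa are derived here in the same way (stress and syllable marks removed), which is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """1 - lev(A, B) / max(length(A), length(B)); defined as 1.0 for two empty strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


print(levenshtein("pɔɾtə", "pweɾta"), levenshtein_similarity("pɔɾtə", "pweɾta"))  # 3 0.5
print(levenshtein("tawlə", "mesa"), levenshtein_similarity("tawlə", "mesa"))      # 5 0.0
```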

References

Arslan, Ruben C., Matthias P. Walther, and Cyril S. Tata. 2020. “Formr: A Study Framework Allowing for Automated Feedback Generation and Complex Longitudinal Experience-Sampling Studies Using R.” Behavior Research Methods 52 (1): 376–87. https://doi.org/10.3758/s13428-019-01236-y.
Bergelson, Elika, and Daniel Swingley. 2012. “At 6–9 Months, Human Infants Know the Meanings of Many Common Nouns.” Proceedings of the National Academy of Sciences 109 (9): 3253–58. https://doi.org/10.1073/pnas.1113380109.
Bilson, Samuel, Hanako Yoshida, Crystal D Tran, Elizabeth A Woods, and Thomas T Hills. 2015. “Semantic Facilitation in Bilingual First Language Acquisition.” Cognition 140: 122–34.
Bosch, Laura, and Marta Ramon-Casas. 2014. “First Translation Equivalents in Bilingual Toddlers’ Expressive Vocabulary: Does Form Similarity Matter?” International Journal of Behavioral Development 38 (4): 317–22. https://doi.org/10.1177/0165025414532559.
Fenson, Larry, Philip S. Dale, J. Steven Reznick, Elizabeth Bates, Donna J. Thal, Stephen J. Pethick, Michael Tomasello, Carolyn B. Mervis, and Joan Stiles. 1994. “Variability in Early Communicative Development.” Monographs of the Society for Research in Child Development 59 (5): i–185. https://doi.org/10.2307/1166093.
Floccia, Caroline, Thomas D. Sambrook, Claire Delle Luche, Rosa Kwok, Jeremy Goslin, Laurence White, Allegra Cattani, et al. 2018. “I: Introduction.” Monographs of the Society for Research in Child Development 83 (1): 7–29. https://doi.org/10.1111/mono.12348.
Hidaka, Shohei. 2013. “A Computational Model Associating Learning Process, Word Attributes, and Age of Acquisition.” PLOS ONE 8 (11): e76242. https://doi.org/10.1371/journal.pone.0076242.
Hoff, Erika, Cynthia Core, Silvia Place, Rosario Rumiche, Melissa Señor, and Marisol Parra. 2012. “Dual Language Exposure and Early Bilingual Development.” Journal of Child Language 39 (1): 1–27. https://doi.org/10.1017/S0305000910000759.
Jusczyk, P. W., and R. N. Aslin. 1995. “Infants’ Detection of the Sound Patterns of Words in Fluent Speech.” Cognitive Psychology 29 (1): 1–23. https://doi.org/10.1006/cogp.1995.1010.
Mitchell, Lori, Rachel K. Y. Tsui, and Krista Byers-Heinlein. 2022. “Cognates Are Advantaged in Early Bilingual Expressive Vocabulary Development.” PsyArXiv. https://doi.org/10.31234/osf.io/daktp.
Mollica, Francis, and Steven T. Piantadosi. 2017. “How Data Drive Early Word Learning: A Cross-Linguistic Waiting Time Analysis.” Open Mind 1 (2): 67–77. https://doi.org/10.1162/OPMI_a_00006.
Tincoff, Ruth, and Peter W. Jusczyk. 1999. “Some Beginnings of Word Comprehension in 6-Month-Olds.” Psychological Science 10 (2): 172–75. https://doi.org/10.1111/1467-9280.00127.
Tsui, Rachel Ka-Ying, Ana Maria Gonzalez-Barrero, Esther Schott, and Krista Byers-Heinlein. 2022. “Are Translation Equivalents Special? Evidence from Simulations and Empirical Data from Bilingual Infants.” Cognition 225 (August): 105084. https://doi.org/10.1016/j.cognition.2022.105084.
